The Plant Genome — Latest Matching Preprints

1

Leveraging whole-genome re-sequencing for diversity, population structure, and a public mid-density genotyping enrichment panel in crimson clover (Trifolium incarnatum L.) for breeding purposes

Castillo, M. P.; Oyebode, O. G.; Talag, J.; Bunting, V.; Kaur, N.; Doran, P.; Barry, K.; Schmutz, J.; Schlautman, B.; Humphries, A.; Ghamkhar, K.; Moore, V.; Rios, E.; Harkess, A.; Wolfe, M.

2026-01-28 genomics 10.64898/2026.01.26.701603 medRxiv

Top 0.1%

50.7%

ABSTRACT: Crimson clover (Trifolium incarnatum L.) is an obligately outcrossing, cool-season annual legume valued for forage and cover cropping, yet genomic resources to support systematic improvement are limited. We performed the first and most comprehensive whole-genome resequencing (WGR) of global crimson clover germplasm to (i) characterize diversity and population structure and (ii) develop a public mid-density enrichment capture panel for breeding applications. A core set of 45 accessions sequenced at ~50X generated 5.84 million variants, while 149 additional accessions sequenced at ~2.54X yielded 17.05 million variants. After stringent filtering, we retained 542,790 high-confidence SNPs from the high-coverage dataset and ~2.4 million from the low-pass cohort. Population analyses (PCA, ADMIXTURE) revealed compact clustering of cultivars, broader dispersion of wild and uncertain-status accessions, and low overall differentiation (FST = 0.0105) with excess heterozygosity (FIS = -0.0592), consistent with obligate outcrossing. Guided by these resources, we designed a 28,913-SNP TWIST hybrid-capture panel enriched for genic regions and evenly distributed across the seven chromosomes. This panel is being deployed within Auburn University's crimson clover breeding program to support population improvement and cultivar development. The resulting genomic resources provide a reproducible, mid-density genotyping platform for trait discovery, predictive breeding, and diversity monitoring. Together, these advances bring crimson clover genomic resources on par with those of other legumes such as soybean (Glycine max (L.) Merr.) and alfalfa (Medicago sativa), establishing a robust foundation for genomics-assisted improvement of this key cover and forage crop in U.S. sustainable agriculture.
CORE IDEAS
- Whole-genome re-sequencing of 194 crimson clover accessions revealed >21 M variants.
- High-confidence SNP catalogs from 50X and 2X data enable cost-effective genotyping.
- Genetic diversity is weakly structured, with cultivars clustering narrowly by origin.
- A 28,913-SNP enrichment panel delivers uniform genome coverage and >75% genic content.
- These genomic tools accelerate GWAS, genomic selection, and breeding innovation.
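The negative FIS reported above indicates more heterozygotes than Hardy-Weinberg proportions predict. The following is a minimal sketch. It assumes biallelic SNPs with known genotype counts and uses the simple estimator F = 1 - Ho/He without sample-size corrections. The abstract does not say which estimator the authors used. Monomorphic loci, where He = 0, are assumed to have been filtered out beforehand.

```python
def fis_single_locus(n_aa: int, n_ab: int, n_bb: int) -> float:
    """F_IS = 1 - Ho/He for one biallelic SNP (no sample-size correction)."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)   # frequency of allele A
    h_obs = n_ab / n                  # observed heterozygosity
    h_exp = 2 * p * (1 - p)           # expected heterozygosity under HWE
    return 1 - h_obs / h_exp


def fis_multi_locus(counts: list[tuple[int, int, int]]) -> float:
    """Ratio-of-sums estimator across loci: 1 - sum(Ho) / sum(He)."""
    h_obs_total = 0.0
    h_exp_total = 0.0
    for n_aa, n_ab, n_bb in counts:
        n = n_aa + n_ab + n_bb
        p = (2 * n_aa + n_ab) / (2 * n)
        h_obs_total += n_ab / n
        h_exp_total += 2 * p * (1 - p)
    return 1 - h_obs_total / h_exp_total


# Heterozygote excess (60% observed vs 50% expected) gives F_IS of about -0.2.
example = fis_single_locus(20, 60, 20)
```

Averaging ratios across loci and taking the ratio of sums can give different answers. The ratio-of-sums form is shown because it is less sensitive to loci with very low expected heterozygosity.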

2

Genomic selection for seed yield enhances flax breeding efficiency

You, F. M.; Zheng, C.; Zagariah Daniel, J. J.; Li, P.; Jackle, K.; House, M.; Tar'an, B.; Cloutier, S.

2026-03-03 genomics 10.64898/2026.03.01.707406 medRxiv

Top 0.1%

40.2%

Genomic selection (GS) is a promising strategy to improve breeding efficiency for complex traits such as seed yield by enabling early selection and reducing reliance on extensive field testing. However, practical deployment of GS remains challenging due to limited training population sizes and reduced prediction accuracies when models are applied to true breeding germplasm. In this study, we evaluated GS for flax (Linum usitatissimum L.) seed yield under realistic breeding scenarios, with a focus on across-population prediction (APP) and breeding decision support rather than model benchmarking. Using historical germplasm collections and a newly developed breeding-oriented population as training sets, GS performance was assessed across multiple independent test populations representing contemporary breeding lines evaluated in replicated yield trials. APP accuracies reached r = 0.84 when training and test populations were genetically aligned, supporting routine breeding deployment. Training population composition emerged as a key determinant of prediction success, with breeding-oriented populations consistently outperforming broad germplasm collections for predicting true breeding lines. Check-based selection analyses showed that GS reliably reproduced phenotypic advancement decisions while eliminating 61-91% of low-performing lines, resulting in a 48-78% reduction in field evaluation costs for a typical cohort of 300 lines. Marker subsampling analyses further indicated that moderate-density genotyping-by-sequencing panels (~2,500-3,000 SNPs) are sufficient to achieve stable prediction accuracies. Overall, these results demonstrate that GS for seed yield in flax is ready for routine integration into breeding programs, offering a practical pathway to reduce costs, accelerate breeding cycles, and enhance selection efficiency.
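The accuracies above are correlations between predicted and observed values. This sketch assumes that r is the Pearson correlation between genomic estimated breeding values (GEBVs) and the observed yields of a test population. It also shows a hypothetical check-based culling rule: advance only lines whose GEBV is at least the mean GEBV of the check cultivars. This rule is an illustrative simplification, not the authors' decision procedure.

```python
import math
from statistics import mean


def pearson_r(predicted: list[float], observed: list[float]) -> float:
    """Predictive ability: Pearson correlation of predictions with observations."""
    mp, mo = mean(predicted), mean(observed)
    sxy = sum((p - mp) * (o - mo) for p, o in zip(predicted, observed))
    sxx = sum((p - mp) ** 2 for p in predicted)
    syy = sum((o - mo) ** 2 for o in observed)
    return sxy / math.sqrt(sxx * syy)


def check_based_cull(gebv: list[float], check_gebv: list[float]) -> tuple[list[int], float]:
    """Hypothetical rule: advance lines whose GEBV is at least the mean check GEBV."""
    threshold = mean(check_gebv)
    kept = [i for i, g in enumerate(gebv) if g >= threshold]
    culled_fraction = 1 - len(kept) / len(gebv)
    return kept, culled_fraction


# Toy GEBVs for four candidate lines and two checks (threshold about 1.05).
kept, culled = check_based_cull([1.2, 0.4, 2.1, 0.9], check_gebv=[1.0, 1.1])
```

The culled fraction is not the same as the cost saving. The abstract reports 61-91% of lines eliminated but a 48-78% cost reduction.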

3

Genomic prediction of single cross families of perennial ryegrass in two nitrogen managements

Santos Junior, D. R. d.; Fe, D.; Lenk, I.; Jensen, C. S.; Asp, T.; Janss, L.; Bornhofen, E.

2026-05-08 genomics 10.64898/2026.05.05.722839 medRxiv

Top 0.1%

38.4%

The performance of a single cross is determined by the average additive effects of the parents, as well as the interactions between them. These quantities can be estimated using an appropriate genetic design, allowing for the estimation of general (GCA) and specific (SCA) combining abilities. When genome-wide marker information is available, GCA can be predicted for new parents and total genetic value can be predicted for unrealized crosses. Several studies in economically important crops such as maize and rice have demonstrated the potential of genomic prediction of single-cross performance. However, no study to date has explored its relevance in perennial ryegrass, an obligately allogamous species that is bred in genetically heterogeneous families. In this study, we aimed to estimate genetic parameters and assess the ability of genomic models to predict the performance of F2 families in terms of dry matter yield and nutritive quality traits. We used data from a large partial diallel involving 104 parents from two distinct subpopulations, as inferred by admixture analysis. F2 families were evaluated in multiple environments and under two nitrogen availability conditions. Genotyping-by-sequencing of the parent plants produced 42,145 variants after quality control, which were used to estimate genomic relationships based on identity-by-state. Variance component estimation revealed limited interaction of GCA and SCA with the environment, particularly with nitrogen management. The predictive abilities of two parental models exceeded 0.60 and often surpassed 0.70 for most traits. However, incorporating non-additive effects into the model did not improve predictive ability. We leveraged the genetic diversity among parents to map genomic regions associated with all recorded traits.
Genome-wide association studies (GWAS) by genomic best linear unbiased prediction (GBLUP) identified six quantitative trait loci (QTL) regions, with 45 candidate genes within the linkage disequilibrium range, estimated at approximately 92 kb. Our results demonstrate that genomic prediction of single crosses can be performed with high accuracy, especially when both parents are also progenitors of families in the training set.
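The GCA/SCA decomposition described above can be illustrated with the classical fixed-effect estimator for a complete half diallel without selfs (Griffing's Method 4). In that model, each cross mean is written as mu + g_i + g_j + s_ij. This sketch is for intuition only. The study used a partial diallel and genomic relationships in a mixed model, and this code implements neither. It assumes every pair of the p >= 3 parents has a cross mean.

```python
from itertools import combinations


def griffing_method4(cross_means: dict[tuple[int, int], float], p: int):
    """Estimate mu, GCA and SCA from a complete half diallel without selfs.

    cross_means[(i, j)] with i < j is the mean of cross i x j; parents are 0..p-1.
    """
    total = sum(cross_means.values())                   # X.. (each cross counted once)
    row = [0.0] * p                                     # X_i.: sum over crosses involving i
    for (i, j), x in cross_means.items():
        row[i] += x
        row[j] += x
    mu = 2 * total / (p * (p - 1))
    gca = [(p * row[i] - 2 * total) / (p * (p - 2)) for i in range(p)]
    sca = {
        (i, j): x - (row[i] + row[j]) / (p - 2) + 2 * total / ((p - 1) * (p - 2))
        for (i, j), x in cross_means.items()
    }
    return mu, gca, sca


# Purely additive toy data: mu = 10, GCA = [1, -1, 2, -2], no SCA.
true_gca = [1.0, -1.0, 2.0, -2.0]
crosses = {(i, j): 10.0 + true_gca[i] + true_gca[j] for i, j in combinations(range(4), 2)}
mu, gca, sca = griffing_method4(crosses, p=4)
```

For a realized cross, mu + g_i + g_j + s_ij reproduces its mean exactly. For an unrealized cross, s_ij is unknown, so the GCA-only value mu + g_i + g_j is the baseline prediction. Marker information is what allows prediction beyond this baseline.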

4

Ensembles of Graph Neural Networks Supervised by Genotype-to-Phenotype Structures Improved Genomic Prediction Performance

Tomura, S.; Powell, O. M.; Wilkinson, M. J.; Cooper, M.

2026-01-06 genomics 10.64898/2025.12.21.695855 medRxiv

Top 0.1%

34.8%

Accurate selection of favourable crop genotypes has motivated the exploration of diverse prediction algorithms for crop breeding applications. One genomic prediction method that has not been fully explored is graph attention networks (GAT). By directly analysing graphical data with the attention mechanism, GAT can incorporate the genotype-to-phenotype (G2P) structure to regularise predictions. As one potential G2P structure, a gene network can be inferred from interpretable machine learning models to effectively learn key features of prediction patterns, potentially improving prediction performance. Here, we investigated whether incorporating such data-driven prior knowledge into GAT improved prediction performance compared to GAT models representing a continuum of G2P structures, ranging from infinitesimal to fully connected. Applying the Diversity Prediction Theorem, we also combined these diverse G2P structures into an ensemble of GAT genomic prediction models to integrate the complementary strengths of multiple models. For flowering time traits in two maize nested association mapping datasets, the data-driven prior knowledge GAT model did not consistently improve performance. However, the ensemble of GAT models performed consistently better. The ensemble's improved predictions may stem from its ability to capture a more complete representation of the inferred gene network by integrating information from diverse G2P structures. These results provide a foundation for future work that integrates biological prior knowledge, derived from omics data and empirically verified gene interactions, into GAT models, potentially further enhancing GAT ensemble performance.
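The Diversity Prediction Theorem applies to an ensemble that averages its members' predictions. It states that the ensemble's squared error equals the members' average squared error minus the diversity, meaning the variance, of their predictions. This is a minimal sketch for a single prediction target. It assumes an equally weighted average; the abstract does not say how the GAT models were weighted.

```python
from statistics import mean


def diversity_prediction_terms(predictions: list[float], truth: float) -> tuple[float, float, float]:
    """Return (ensemble error, mean member error, prediction diversity) for one target."""
    ensemble = mean(predictions)                                   # equally weighted ensemble
    ensemble_error = (ensemble - truth) ** 2
    mean_member_error = mean((p - truth) ** 2 for p in predictions)
    diversity = mean((p - ensemble) ** 2 for p in predictions)
    return ensemble_error, mean_member_error, diversity


# Three hypothetical model predictions for one individual whose true value is 4.
err, avg_err, div = diversity_prediction_terms([1.0, 2.0, 6.0], truth=4.0)
# Identity: err == avg_err - div, here 1 == 17/3 - 14/3.
```

Diversity is never negative, so under squared error an equally weighted ensemble can never do worse than its average member. It improves on that average exactly in proportion to how much its members disagree.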

5

Reaction Norm Modeling of High-Dimensional Genomic and Environmental Data Improves Prediction Accuracy in Winter Wheat

Acharya, S. R.; Garcia-Abadillo, J.; Lyerly, J.; Brown-Guedira, G.; Jarquin, D.; Bandillo, N.

2026-05-08 genetics 10.64898/2026.05.05.722758 medRxiv

Top 0.1%

34.2%

Genomic prediction models that account for genotype-by-environment (GxE) interactions have the potential to accelerate the rate of genetic gain for yield and agronomic performance, yet relatively few studies have applied GxE prediction in public soft red winter wheat (Triticum aestivum) breeding programs. In this study, we extended a reaction norm-based genomic prediction framework by integrating weather-based environmental covariates to more effectively capture genotype-by-environment interactions. Key agronomic traits, including seed yield, plant height, test weight, and heading date, were evaluated across 33 environments (location-year combinations) using over 3,200 breeding lines from the North Carolina State University small grains breeding program. Multiple genomic prediction models were compared using several cross-validation (CV) schemes representing common breeding scenarios. Across traits, the reaction norm M5 model, which incorporates both GxE and genotype-by-environmental covariate interactions (GxO), achieved the highest prediction accuracy (PA) in CV2 (predicting incomplete field trials) and, for yield and test weight, in CV1 (predicting new lines). The highest PA was observed for test weight under CV2 (0.54) and for yield under CV1 (0.41). Under CV0 (predicting new environments), the M3 model incorporating GxE produced the highest PA across traits, with the greatest accuracy for plant height (0.45), although differences among M2, M3, and M4 were small. Prediction under CV00 (predicting new lines in new environments) remained more challenging, with PA values of 0.10-0.20 across traits. Overall, our results demonstrate that integrating environmental covariates into genomic prediction models can improve predictive performance across diverse wheat-growing environments in North Carolina, supporting their utility for applied breeding efforts.
CORE IDEAS
- Integrating genotype-by-environment (GxE) interactions with environmental covariates improves prediction accuracy across environments.
- Model performance varies by prediction scenario, with different approaches performing best for new lines, incomplete trials, or new environments.
- Prediction of new lines in new environments remains challenging.
PLAIN LANGUAGE SUMMARY: This study explores how adding environmental information to genomic prediction models can improve prediction accuracy in a public winter wheat breeding program. Using data from multi-environment trials conducted across diverse conditions in North Carolina, we evaluated statistical models that capture how different wheat lines respond to changing environments. By incorporating weather data, we improved the ability to predict performance across locations and years. These findings provide practical insights for refining selection strategies and accelerating genetic gain in wheat breeding.
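The four cross-validation schemes can be expressed as rules for which (line, environment) records are held out. This sketch follows common definitions of these schemes, which are assumed here rather than taken from the paper:

- CV2 masks individual line-by-environment records.
- CV1 masks every record of selected lines.
- CV0 masks every record of selected environments.
- CV00 tests held-out lines in held-out environments. It also drops from training any record that shares a held-out line or environment.

The function name and arguments are hypothetical.

```python
from typing import Iterable

Record = tuple[str, str]  # (line_id, environment_id)


def cv_split(records: Iterable[Record], scheme: str,
             held_lines: frozenset = frozenset(),
             held_envs: frozenset = frozenset(),
             held_pairs: frozenset = frozenset()) -> tuple[list[Record], list[Record]]:
    """Partition records into (train, test) for one CV scheme."""
    train, test = [], []
    for line, env in records:
        new_line, new_env = line in held_lines, env in held_envs
        if scheme == "CV2":      # incomplete field trials: specific plots missing
            is_test, usable = (line, env) in held_pairs, True
        elif scheme == "CV1":    # new lines, observed environments
            is_test, usable = new_line, True
        elif scheme == "CV0":    # observed lines, new environments
            is_test, usable = new_env, True
        elif scheme == "CV00":   # new lines in new environments
            # Records sharing only the line or only the environment are discarded.
            is_test, usable = new_line and new_env, not (new_line or new_env)
        else:
            raise ValueError(f"unknown scheme {scheme!r}")
        if is_test:
            test.append((line, env))
        elif usable:
            train.append((line, env))
    return train, test


records = [(line, env) for line in "ABC" for env in ("E1", "E2")]
train, test = cv_split(records, "CV00", held_lines=frozenset({"A"}), held_envs=frozenset({"E1"}))
```

CV00 leaves the smallest training set because information about both the new line and the new environment is removed. This is consistent with CV00 being the hardest scenario in the results above.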

6

Phenotyping replication is a major determinant of genomic predictive ability in sweet sorghum (Sorghum bicolor Moench)

CHARLES, J. R.; Rice, B.; Tovignan, T.; Morris, G. P.; Pressoir, G.

2026-06-19 genomics 10.64898/2026.06.15.731123 medRxiv

Top 0.1%

34.0%

Genomic selection can increase the rate of genetic gain in crop breeding programs, but its effectiveness depends on the reliability of phenotypic data, the size and composition of the training population (TP), and the statistical model used to estimate genomic breeding values. These design choices are especially important in resource-limited breeding programs, where additional replication, larger TPs, and more extensive genotyping compete for the same resources. Using empirical data from a sweet sorghum [Sorghum bicolor (L.) Moench] breeding population developed by CHIBAS, we evaluated the effects of phenotyping replication, TP size, training-validation genomic relatedness, and genomic prediction (GP) model on predictive ability (PA). Grain yield, plant height, stem weight, and total soluble solids were evaluated across three field environments. Few studies in sorghum have examined these factors together with comparable empirical rigor. Increasing replication improved genomic heritability and PA for all traits and environments, with the largest gains observed for grain yield. Larger TPs and increased training-validation genomic relatedness also improved PA, but their effects were most pronounced when phenotype estimates were based on multiple replicates. GP models produced largely comparable PA across all evaluated traits, with a few exceptions. These findings provide practical guidance for optimizing genomic selection in resource-limited sorghum breeding programs.
ARTICLE SUMMARY: Genomic selection can accelerate breeding only when the phenotypes used to train prediction models are highly reliable. Using a sweet sorghum breeding population evaluated in three Haitian field environments, we quantified how replication number, training population size, training-validation genomic relatedness, and prediction model affected genomic predictive ability for grain yield, plant height, stem weight, and total soluble solids. Replication increased genomic heritability and predictive ability for all traits, with the strongest effects for grain yield. Larger and more connected training populations improved prediction, mainly when replication was adequate. These results provide practical guidance for resource-limited breeding programs.
Core ideas
- In this empirical sweet sorghum breeding population, phenotyping replication was the dominant factor explaining variation in genomic predictive ability across traits and environments.
- The benefit of larger training populations and greater training-validation genomic relatedness increased when phenotype estimates were based on more replicates.
- Grain yield, the most environmentally sensitive trait evaluated, showed the largest response to improved replication and training-population design.
- Bayesian models, rrBLUP, and GBLUP showed similar predictive abilities across traits and environments, suggesting that phenotyping and experimental design may be more important than model complexity.
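The classical entry-mean heritability, H = σ²g / (σ²g + σ²e / r) for r replicates, shows why replication matters. This formula is used here only as an illustration. It ignores genotype-by-environment variance and is not the genomic heritability the authors estimated. The variance components below are made-up values.

```python
def entry_mean_heritability(var_g: float, var_e: float, n_reps: int) -> float:
    """Heritability of genotype means averaged over n_reps replicates."""
    return var_g / (var_g + var_e / n_reps)


# Hypothetical variance components: residual variance twice the genetic variance.
curve = {r: entry_mean_heritability(1.0, 2.0, r) for r in (1, 2, 3, 4)}
```

With these values, H rises from about 0.33 with one replicate to 0.5 with two and about 0.67 with four. Averaging replicates shrinks the residual term, so the phenotypes used to train a model track genetic value more closely. Traits with a large residual share, such as an environmentally sensitive trait like grain yield, gain the most.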

7

Genome-Wide Markers Predict Metribuzin Tolerance in Southern Soft Red Winter Wheat

Sellani, J.; Anzueto, H.; Arcenaux, K.; Price, P. T.; Brown-Guedira, G.; Harrison, S.; DeWitt, N.

2026-07-03 genomics 10.64898/2026.06.28.733875 medRxiv

Top 0.1%

33.9%

Metribuzin is a versatile herbicide effective against various annual grasses and broadleaf weeds found in wheat fields. However, it can cause foliar damage to wheat, impacting plant health and yield. A clearer understanding of the genetic architecture associated with metribuzin tolerance is necessary to guide marker-based breeding strategies. This study evaluated 351 historic Gulf Atlantic Wheat Nursery (GAWN) wheat breeding lines representative of southern US soft red winter wheat (SRWW) germplasm. Field trials were conducted at Winnsboro (WN) and Baton Rouge (BR), Louisiana, in 2016 and 2017. Metribuzin was applied at specific growth stages, and tolerance was assessed based on visual foliar damage. Genomic data from 6,252 filtered single nucleotide polymorphism (SNP) markers were used to estimate narrow-sense heritability, conduct genome-wide association studies (GWAS), and assess genomic prediction accuracy using genomic best linear unbiased prediction (GBLUP). Broad-sense heritability ranged from 0.54 to 0.69 within environments and reached 0.77 across environments, while narrow-sense heritability ranged from 0.35 to 0.47, indicating moderate additive genetic control. No SNP surpassed the significance threshold, but genomic prediction (GP) showed moderate to strong predictive ability (PA) across environments, with the highest accuracy (r = 0.62) observed between BR17 and WN17. These results indicate that metribuzin tolerance in SRWW is primarily controlled by multiple small-effect loci and that genomic selection (GS) provides a more effective breeding strategy than marker-assisted selection for improving tolerance in southern wheat germplasm.
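GBLUP, used above for genomic prediction, relies on a genomic relationship matrix G built from SNP dosages. The abstract does not say how G was computed. This sketch assumes VanRaden's first method, G = ZZ' / (2 Σ p_k(1 - p_k)), with markers coded 0/1/2 and Z the dosage matrix centred by twice each marker's allele frequency. The input is assumed to contain at least one polymorphic SNP.

```python
def vanraden_g(dosages: list[list[int]]) -> list[list[float]]:
    """Genomic relationship matrix (VanRaden method 1).

    dosages: rows are individuals, columns are SNPs coded 0/1/2.
    """
    n, m = len(dosages), len(dosages[0])
    # Frequency of the counted allele at each SNP.
    p = [sum(row[k] for row in dosages) / (2 * n) for k in range(m)]
    # Centre each column by its expected dosage 2p.
    z = [[row[k] - 2 * p[k] for k in range(m)] for row in dosages]
    scale = 2 * sum(pk * (1 - pk) for pk in p)
    return [[sum(a * b for a, b in zip(zi, zj)) / scale for zj in z] for zi in z]


# Two individuals homozygous for opposite alleles at two SNPs.
g = vanraden_g([[0, 2], [2, 0]])
```

In GBLUP, G replaces the pedigree relationship matrix as the covariance structure of the genetic effects. Lines that share more alleles with phenotyped lines therefore receive more informative predictions.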

8

Development and evaluation of a cost-effective, mid-density SNP array as a sorghum community genotyping resource

Kumar, V.; Klein, R. R.; Kaufman, B.; Winans, N. D.; Crozier, D.; Rooney, W. L.; Harrison, M.; Hayes, C.; Tello-Ruiz, M. K.; Gladman, N. P.; Olson, C.; Burow, G.; Sexton-Bowser, S.; Punnuri, S.; Knoll, J.; Dahlberg, J.; Ware, D.

2026-02-23 plant biology 10.64898/2026.02.20.706663 medRxiv

Top 0.1%

33.6%

The development of accessible and cost-effective genotyping platforms is essential to accelerate genetic gain in crop improvement. To address the U.S. sorghum community's need for a standardized, mid-density genotyping resource, we developed and validated a targeted single-nucleotide polymorphism (SNP) array using the PlexSeq next-generation sequencing (NGS) platform. The resulting genotyping array includes 2,421 SNPs spanning all ten Sorghum bicolor chromosomes and integrates trait-linked and quality control markers selected by public and private stakeholders. Genotyping 2,726 diverse accessions, including the Sorghum Association Panel (SAP), demonstrated high call rates (>90% for most samples and markers), low missing data, and accurate resolution of population structure consistent with prior whole-genome studies. In comparative genomic prediction analyses, the mid-density array performed equivalently to high-density genotyping-by-sequencing (GBS) platforms for key traits such as grain yield and plant height across multi-environment trials. Designed for broad utility in breeding pipelines, the array enables marker-assisted selection, genomic prediction, identity verification, and germplasm quality control. Moreover, its adoption by the USDA National Plant Germplasm System facilitates the curation of genebanks and the management of core collections. This community-driven genotyping platform offers a scalable, reproducible, and customizable tool to support molecular breeding in sorghum and underscores the value of targeted marker systems in resource-optimized crop improvement programs.
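Call rate is one of the quality metrics reported above. It is the fraction of non-missing genotype calls per sample or per marker. This sketch assumes a sample-by-marker matrix with None marking a missing call. The 0.90 default threshold mirrors the abstract's ">90%" and is otherwise arbitrary.

```python
def call_rates(calls: list[list[object]]) -> tuple[list[float], list[float]]:
    """Per-sample and per-marker call rates; None marks a missing genotype."""
    n_samples, n_markers = len(calls), len(calls[0])
    per_sample = [sum(c is not None for c in row) / n_markers for row in calls]
    per_marker = [sum(row[k] is not None for row in calls) / n_samples
                  for k in range(n_markers)]
    return per_sample, per_marker


def passing(rates: list[float], threshold: float = 0.90) -> list[int]:
    """Indices whose call rate exceeds the threshold."""
    return [i for i, r in enumerate(rates) if r > threshold]


# Two samples genotyped at four markers; sample 0 is missing marker 2.
sample_rates, marker_rates = call_rates([[0, 1, None, 2], [2, 2, 2, 2]])
```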

9

MultiGS: A comprehensive and user-friendly genomic prediction platform Integrating statistical, machine learning, and deep learning models for breeders

You, F.; Zheng, C.; Daniel, J. J. Z.; Li, P.; Taran, B.; Cloutier, S.

2026-01-02 bioinformatics 10.64898/2026.01.02.697306 medRxiv

Top 0.1%

33.2%

Genomic selection (GS) is a core strategy in modern breeding programs, yet the rapid expansion of statistical, machine-learning (ML), and deep-learning (DL) models has made systematic evaluation and practical deployment increasingly challenging. To address these issues, we developed MultiGS, a unified and user-friendly framework that integrates linear, ML, DL, hybrid, and ensemble GS models within a standardized and computationally efficient workflow. MultiGS is implemented through two complementary pipelines: MultiGS-R, a Java/R pipeline implementing 12 statistical and ML models, and MultiGS-P, a Python pipeline integrating 17 models, including five linear models, three ML approaches, and nine recently developed DL architectures implemented within the framework. We benchmarked MultiGS using wheat, maize, and flax datasets representing contrasting prediction scenarios. Wheat and maize were evaluated using random training-test splits within the same population, reflecting suitable conditions for assessing model capacity and scalability. Under these scenarios, several DL, hybrid, and ensemble models achieved prediction accuracies comparable to RR-BLUP and consistently exceeded those of GBLUP. In contrast, the flax dataset represented a true across-population prediction scenario with limited training set size and strong population structure. In this challenging context, classical linear models provided stable baselines, while a subset of DL architectures, particularly graph-based models and BLUP-integrated hybrids, demonstrated comparatively improved generalization across populations. Comparisons with previously published DL tools showed that MultiGS models achieved comparable or improved prediction accuracies at lower computational cost, enabling routine retraining and large-scale evaluation. Overall, MultiGS informs scenario-specific model selection and provides a practical platform for deploying genomic prediction under realistic breeding conditions.
The software is freely available on GitHub (https://github.com/AAFC-ORDC-Crop-Bioinfomatics/MultiGS).
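RR-BLUP is the linear baseline the benchmarks above compare against. It is ridge regression of phenotypes on SNP dosages: β = (X'X + λI)⁻¹ X'(y - ȳ), with the columns of X centred. This dependency-free sketch is for illustration only and is not code from MultiGS. λ is taken as given here; in practice it is usually set to σ²e/σ²β, with the variance components estimated by REML.

```python
def solve_linear(a: list[list[float]], b: list[float]) -> list[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [bi] for row, bi in zip(a, b)]       # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):                      # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def rr_blup(x: list[list[float]], y: list[float], lam: float):
    """Ridge regression of y on marker dosages x with shrinkage parameter lam.

    Columns of x and y are centred, so the intercept is the phenotypic mean.
    Returns the marker effects and a predict(row) function.
    """
    n, p = len(x), len(x[0])
    xbar = [sum(row[k] for row in x) / n for k in range(p)]
    ybar = sum(y) / n
    xc = [[row[k] - xbar[k] for k in range(p)] for row in x]
    # Left-hand side X'X + lam * I and right-hand side X'(y - ybar).
    lhs = [[sum(r[i] * r[j] for r in xc) + (lam if i == j else 0.0) for j in range(p)]
           for i in range(p)]
    rhs = [sum(r[i] * (yi - ybar) for r, yi in zip(xc, y)) for i in range(p)]
    beta = solve_linear(lhs, rhs)

    def predict(row: list[float]) -> float:
        return ybar + sum((row[k] - xbar[k]) * beta[k] for k in range(p))

    return beta, predict
```

A very small λ approaches ordinary least squares, and a very large λ shrinks every marker effect towards zero. When markers far outnumber lines, it is common to solve the equivalent n-by-n GBLUP system instead of this p-by-p one.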

10

Uncovering Superior Alleles and Genetic Loci for Yield-Related Traits in Mungbean (Vigna radiata L. Wilczek) Through Genome-Wide Association Study

Shahin Uz Zaman, M.; Iqbal, M. S.; Prodhan, A.; Alam, A. M.

2025-04-19 genomics 10.1101/2025.04.15.648935 medRxiv

Top 0.1%

31.0%

Mungbean is an important legume crop in South and Southeast Asia and Australia in terms of area coverage and production. However, productivity remains low due to limited genetic diversity, necessitating the dissection of the genetic basis of quantitatively inherited yield-related traits to develop stable and high-yielding varieties. In the current study, a total of 296 mungbean minicore germplasm accessions from the World Vegetable Centre were evaluated over three years in Bangladesh to assess their genetic diversity and local adaptation. Of the 296 accessions, 206 flowered and produced yield, showing significant genetic variation in six yield-related traits: days to flowering (DF), days to maturity (DM), plant height (PH), pods per plant (PODS), 100-seed weight (HSW), and seed yield (YLD). All phenotypic traits showed moderate to high broad-sense heritability, including DF (72%) and HSW (62%). A genome-wide association study (GWAS) using 4,307 high-quality SNPs obtained by genotyping-by-sequencing identified 16 genetic loci on six mungbean chromosomes associated with the six traits. Further, we selected 21 superior germplasm accessions based on a multi-trait stability index, including four accessions with a higher number of favorable alleles (10). We also employed genomic prediction models and found moderate prediction accuracy (>30%) for HSW and YLD. These results will assist in incorporating important alleles into elite mungbean germplasm through marker-assisted breeding and/or genomic prediction for improving mungbean yield.
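Counting favorable alleles, as for the accessions above that carry 10, is a tally over the trait-associated loci. This sketch makes the following assumptions:

- Calls are homozygous, which is reasonable for a mostly self-pollinating crop such as mungbean but is still an assumption.
- The favorable allele at each locus is known from the sign of its GWAS effect.
- The dictionary layout is hypothetical.

```python
def count_favorable_alleles(genotype: dict[str, str], favorable: dict[str, str]) -> int:
    """Number of associated loci at which an accession carries the favorable allele.

    genotype: locus -> allele carried (homozygous calls assumed; missing loci skipped).
    favorable: locus -> allele associated with the desirable trait value.
    """
    return sum(1 for locus, allele in favorable.items() if genotype.get(locus) == allele)


def rank_accessions(panel: dict[str, dict[str, str]],
                    favorable: dict[str, str]) -> list[tuple[str, int]]:
    """Accessions sorted by favorable-allele count, highest first."""
    counts = [(acc, count_favorable_alleles(g, favorable)) for acc, g in panel.items()]
    return sorted(counts, key=lambda t: t[1], reverse=True)


favorable = {"L1": "A", "L2": "G", "L3": "T"}
panel = {
    "acc1": {"L1": "A", "L2": "G", "L3": "C"},
    "acc2": {"L1": "A", "L2": "G", "L3": "T"},
    "acc3": {"L1": "C"},                      # other loci not called
}
ranking = rank_accessions(panel, favorable)
```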

11

Comparison of localGEBV and Optimal Haplotype Stacking Fitness Functions using a Novel R Package: HapSelect

Shaffer, W.; Papin, V.; Carter, Z.; Brunner, S. M.; Tong, J.; Villiers, K.; Robinson, H.; Voss-Fels, K.; Hayes, B. J.; Hickey, L.; Dinglasan, E.

2026-07-13 genetics 10.64898/2026.07.08.737160 medRxiv

Top 0.1%

30.7%

Haplotype-based breeding strategies have emerged as promising approaches to maximize long-term genetic gain by identifying complementary parental combinations while maintaining genetic diversity. However, these methods typically require phased genotypes, more intensive workflow pipelines, and specialized skillsets. We developed a novel local genomic estimated breeding value (localGEBV) fitness function with similar intent to the optimal haplotype stacking (OHS) framework fitness function and implemented both in the novel R package HapSelect. Our aim was to evaluate whether phased haplotypes provide additional benefit over the more easily obtained dosage-based unphased genotypes in highly inbred crops. A subset of a bread wheat nested association mapping (NAM) population comprising 444 lines genotyped with 6,054 DArT-Seq markers was analysed. Marker effects were estimated using rrBLUP; localGEBV and haplotype effects were calculated across linkage disequilibrium-defined haploblocks; and genetic algorithms (GA) were used to identify optimal sets of 30 founders using either a localGEBV-derived fitness function with unphased dosage inputs or the OHS fitness function with phased inputs. Selected parental sets were compared with conventional truncation selection (TS) through 150 generations of forward simulation. The OHS fitness function achieved a marginally greater optimized ultimate GEBV than the localGEBV fitness function during GA optimization, with only 18 of the 30 selected founders overlapping between the two methods. Despite these differences, forward simulations demonstrated nearly identical long-term genetic gain for localGEBV- and OHS-selected founders, with both approaches outperforming conventional truncation selection by maintaining greater genetic diversity and delaying the genetic plateau. The minimal difference between localGEBV and OHS is likely attributable to the high homozygosity of the population, where localGEBV and haplotype effects are nearly confounded.
These results demonstrate that dosage-based localGEBV provides a practical alternative to phased haplotype approaches for parent selection in inbred crops, substantially simplifying genomic workflows while maintaining long-term breeding performance. Future work should evaluate these methods in more diverse inbred populations and in outbred species, where greater haplotypic diversity may increase the advantage of true haplotype-based optimizations.
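The contrast above between a dosage-based localGEBV fitness and haplotype stacking can be sketched as follows. The definitions are hypothetical and are not taken from HapSelect:

- A block's localGEBV is the sum of marker dosage times rrBLUP effect over that block's markers.
- A founder set's fitness is the sum, over blocks, of the best localGEBV carried by any founder in the set. This is the "stacking" idea.

An exhaustive search over small sets stands in for the genetic algorithm used in the study.

```python
from itertools import combinations


def local_gebvs(dosages: list[int], effects: list[float], blocks: list[list[int]]) -> list[float]:
    """Per-block GEBV: sum of dosage * marker effect over the markers in each block."""
    return [sum(dosages[k] * effects[k] for k in block) for block in blocks]


def stacking_fitness(founders, dosage_matrix, effects, blocks) -> float:
    """Sum over blocks of the best localGEBV carried by any founder in the set."""
    per_founder = [local_gebvs(dosage_matrix[f], effects, blocks) for f in founders]
    return sum(max(lg[b] for lg in per_founder) for b in range(len(blocks)))


def best_founder_set(dosage_matrix, effects, blocks, size):
    """Exhaustive search; feasible only for tiny examples (the study used a GA)."""
    candidates = range(len(dosage_matrix))
    return max(combinations(candidates, size),
               key=lambda s: stacking_fitness(s, dosage_matrix, effects, blocks))


# Three fully homozygous lines (dosages 0 or 2), four markers in two blocks.
dosage_matrix = [[2, 2, 0, 0], [0, 0, 2, 2], [2, 0, 0, 2]]
effects = [1.0, 0.5, -1.0, 2.0]
blocks = [[0, 1], [2, 3]]
chosen = best_founder_set(dosage_matrix, effects, blocks, size=2)
```

In fully homozygous lines, dosages are 0 or 2. Each block's localGEBV is then exactly twice the value of the one haplotype the line carries, so ranking by localGEBV and ranking by haplotype value coincide. This mirrors the abstract's explanation of why the two fitness functions behaved almost identically.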

12

Benchmarking SNP-Calling Accuracy Against Known Citrus Pedigrees Reveals Pangenome Advantages Over Linear References

Kuster, R. D.; Sisler, P.; Sandhu, K.; Yin, L.; Niece, S.; Krueger, R.; Dardick, C.; Keremane, M.; Ramadugu, C.; Staton, M. E.

2026-04-09 genomics 10.64898/2026.04.07.716967 medRxiv

Top 0.1%

30.7%

Background: Pangenomes are a promising new approach to genomics that can reduce reference bias in genotyping, but the reliability of such a data model for tracking variation across species remains unclear. To test the utility of graph-based pangenomes for interspecific breeding, we developed a Minigraph-Cactus super-pangenome representing four Citrus species derived from the founder lines of a citrus breeding program. To benchmark SNP-calling accuracy using graph-based and linear approaches, we performed whole-genome short-read sequencing for two sets of pedigreed progeny: 30 F1 hybrids and 244 advanced hybrids from an F1 crossed with a parent not included in the pangenome.
Results: The linear approach yielded more SNP calls than the graph-based approach; however, both methods exhibited similar Mendelian inheritance error rates (MIER) in a tool-dependent manner. Reconstruction of parental haplotype blocks in the advanced hybrids revealed a striking improvement in performance for the pangenome graph-based calls, suggesting that MIER is vulnerable to error when reference bias influences both parental and progeny genotype calls. Masking regions that diverged from the reference path improved MIER accuracy metrics and haplotype block reconstruction in both the linear and graph-based SNP calls.
Conclusions: In non-model systems, inheritance patterns observed in pedigreed hybrids provide a framework for benchmarking variant-calling accuracy using pangenomes. SNP miscalls originating from diverged regions can falsely satisfy MIER filters, so we recommend also evaluating haplotype block reconstruction. The inherent structure of the pangenome graph has promising applications for removing regions of unreliable mapping quality, which cannot otherwise be reliably removed using traditional filtering metrics.
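The Mendelian inheritance error rate (MIER) used above can be computed for each SNP from parent-offspring trios. This sketch makes the following assumptions:

- Calls are biallelic, diploid dosages (0/1/2).
- Both parents of each progeny are genotyped.
- Trios are stored as (parent1, parent2, child) tuples, which is a hypothetical layout.
- At least one trio is fully called.

As the abstract cautions, a check like this cannot detect miscalls that are consistent across parents and progeny, such as those caused by shared reference bias.

```python
def transmissible(dosage: int) -> set[int]:
    """Alternate-allele counts (0 or 1) a diploid parent with this dosage can pass on."""
    return {0: {0}, 1: {0, 1}, 2: {1}}[dosage]


def is_mendelian_error(parent1: int, parent2: int, child: int) -> bool:
    """True if the child's dosage cannot arise from one gamete of each parent."""
    possible = {a + b for a in transmissible(parent1) for b in transmissible(parent2)}
    return child not in possible


def mier(trios: list[tuple]) -> float:
    """Fraction of fully called trios whose child genotype is impossible."""
    called = [t for t in trios if None not in t]
    errors = sum(is_mendelian_error(*t) for t in called)
    return errors / len(called)


# Four trios at one SNP; the last has a missing child call and is skipped.
rate = mier([(0, 0, 0), (0, 0, 1), (1, 1, 2), (2, 2, None)])
```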

13

Sparse testcrossing for early-stage genomic prediction of general combining ability to increase genetic gain in maize hybrid breeding programs

Gonzalez-Dieguez, D. O.; Atlin, G. N.; Beyene, Y.; WEGARY, D.; Gemenet, D. C.; Werner, C. R.

2025-02-24 genomics 10.1101/2025.02.19.639156 medRxiv

Top 0.1%

30.6%

Sparse testcrossing is an effective strategy for increasing both short- and long-term genetic gain in hybrid breeding programs. Maize hybrid breeding programs aim to develop new hybrid varieties by crossing genetically distinct parents from different heterotic pools, exploiting heterosis for improved performance. The programs typically consist of two main components: population improvement and product development. The population improvement component aims to enhance the heterotic pools through reciprocal recurrent selection based on general combining ability (GCA). However, especially in the early stages of testing, evaluating large numbers of hybrid combinations to estimate GCA is impractical due to considerable logistical challenges and costs. Therefore, breeders often evaluate the initial population of selection candidates using only a single tester to narrow down the candidate pool before further evaluation. Using a single tester, however, may not adequately represent the heterotic pool, leading to inaccurate GCA estimates and suboptimal selection decisions. To address this, we propose sparse testcrossing for early-stage testing, where subsets of candidate genotypes are testcrossed with different testers, connected through a genomic relationship matrix. We conducted stochastic simulations to compare various sparse testcrossing designs with a conventional testcross strategy using a single tester over 15 cycles of reciprocal recurrent genomic selection. Our results show that sparse testcrossing with 3-5 testers distributed among full-sibs offers breeders a practical balance among simple testcross designs, resource efficiency, and increased prediction accuracy for GCA, ultimately resulting in increased rates of genetic gain.
Key message: Sparse testcrossing with 3-5 testers enhances genetic gain in hybrid breeding programs, offering a practical balance of simple testcross designs, resource efficiency, and increased prediction accuracy for general combining ability.
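One simple way to spread full-sibs across testers is round-robin assignment within each family. This method is assumed here for illustration; the paper's simulated designs may allocate candidates differently. When each family has at least as many members as there are testers, every family is evaluated with every tester while each candidate is crossed to only one tester. The function name and input layout are hypothetical.

```python
from collections import defaultdict


def sparse_testcross_plan(candidates: list[tuple[str, str]], testers: list[str]) -> dict[str, str]:
    """Map each candidate to one tester, rotating testers within each full-sib family.

    candidates: (candidate_id, family_id) pairs.
    """
    by_family: dict[str, list[str]] = defaultdict(list)
    for cid, fam in candidates:
        by_family[fam].append(cid)
    plan = {}
    for fam in sorted(by_family):
        for k, cid in enumerate(by_family[fam]):
            plan[cid] = testers[k % len(testers)]   # round-robin within the family
    return plan


candidates = [("c1", "F1"), ("c2", "F1"), ("c3", "F1"),
              ("c4", "F2"), ("c5", "F2"), ("c6", "F2")]
plan = sparse_testcross_plan(candidates, testers=["T1", "T2", "T3"])
```

Because full-sibs are closely related, the genomic relationship matrix links each candidate's single testcross to the others' testcrosses. GCA can then be estimated against several testers without making every candidate-by-tester cross.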

14

Evaluating the breeding potential of cultivated lentils for increasing protein and amino acid concentration in the Northern Great Plains

Wright, D. M.; Hang, J.; House, J. D.; Bett, K. E.

2024-04-29 plant biology 10.1101/2024.04.26.591363 medRxiv

Top 0.1%

30.4%

The rising demand for plant-based proteins has intensified interest in pulse crops due to their high protein concentration. However, few studies have evaluated protein and amino acid composition and variability in cultivated lentil (Lens culinaris Medik.). We evaluated protein and amino acid composition using near-infrared reflectance spectroscopy (NIRS) in a diversity panel grown in four site-years in Saskatchewan, Canada, followed by genome-wide association analyses with phenology-related traits as covariates. Protein concentration showed little correlation with days from sowing to flowering, region of origin, cotyledon color, or seed size, but was correlated with reproductive period. We also observed large variability between environments and more variability within market classes than among them. Our results demonstrate the potential for breeders to identify germplasm and select for increased protein and amino acid concentration and quality using a high-throughput NIRS method. We identified numerous molecular markers for use in marker-assisted breeding. Our approach could be replicated by breeders in other regions or with other pulse crops to help meet the demand for plant-based protein and improve protein quality.

15

DeepVariant calling provides insights into race diversity and its implication for sorghum breeding

Ruperao, P.; Gandham, P.; Odeny, D. A.; Selvanayagam, S.; Thirunavukkarasu, N.; Das, R. R.; Srikanda, M.; Gandhi, H.; Habyarimana, E.; Manyasa, E.; Nebie, B.; Deshpande, S. P.; Rathore, A.

2022-09-08 plant biology 10.1101/2022.09.06.505536 medRxiv

Top 0.1%

30.4%

Owing to evolutionary divergence, sorghum race populations exhibit vast genetic and morphological variation. A k-mer-based comparison of sorghum race sequences identified k-mers conserved across all race accessions, and race-specific genetic signatures revealed presence/absence variation (PAV) in 10,321 genes. To understand sorghum race structure, diversity, and domestication, a deep learning-based variant calling approach was applied to genotypic data from a diverse panel of 272 sorghum accessions. This yielded 1.7 million high-quality genome-wide SNPs, and a genome-wide scan with two statistics (iHS and XP-EHH) identified regions bearing signatures of selection (both positive and negative). We discovered 2,370 genes associated with selection signatures, including 179 selective sweep regions distributed over 10 chromosomes. Co-localization of these regions under selective pressure with previously reported QTLs and genes indicated that the selection signatures could be related to the domestication of important agronomic traits such as biomass and plant height. The k-mer signatures developed here will be useful for identifying sorghum races, and the SNP markers will assist plant breeding programs.
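The conserved and race-specific k-mer comparison can be pictured with set operations. The sketch below is not the study's pipeline: it assumes exact canonical k-mers (a k-mer and its reverse complement count as one) drawn from one string per accession, whereas real analyses work on reads or assemblies with much larger k and count thresholds. The race labels are real sorghum race names, but the sequences are toy placeholders.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer):
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    return min(kmer, kmer.translate(COMPLEMENT)[::-1])


def kmer_set(seq, k):
    """All canonical k-mers in `seq`, skipping windows with non-ACGT bases."""
    seq = seq.upper()
    return {
        canonical(seq[i:i + k])
        for i in range(len(seq) - k + 1)
        if set(seq[i:i + k]) <= set("ACGT")
    }


def conserved_and_specific(accessions, k):
    """k-mers shared by every accession, and k-mers unique to each one."""
    sets = {name: kmer_set(seq, k) for name, seq in accessions.items()}
    conserved = set.intersection(*sets.values())
    specific = {
        name: s - set().union(*(o for other, o in sets.items() if other != name))
        for name, s in sets.items()
    }
    return conserved, specific


# Toy sequences (hypothetical), k = 3 for readability.
toy = {"bicolor": "ACGTAC", "guinea": "ACGTTT"}
conserved, specific = conserved_and_specific(toy, k=3)
```

Canonicalizing k-mers keeps the comparison independent of which DNA strand a sequence was read from; here "ACG" is shared, while "GTA" occurs only in the bicolor toy sequence.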

16

Prioritization of Deleterious Mutations Improves Genomic Prediction and Increases the Rate of Genetic Gain in Common Bean (Phaseolus vulgaris L.), a Simulation Study

Cordoba Novoa, H. A.; Hoyos-Villegas, V.

2025-05-09 genetics 10.1101/2025.05.05.652208 medRxiv

Top 0.1%

30.3%

The study of mutations is fundamental to understanding evolution, domestication, and genetics. Characterizing mutations has the potential to accelerate breeding programs through selection against and purging of deleterious mutations (DelMut). Here, we investigated how predicting DelMut in breeding populations can improve genomic prediction (GP) and inform strategies to increase the rate of genetic gain. DelMut were annotated in three independent common bean populations using a previously developed random forest (RF) model incorporating phylogenetic and protein information. Deleterious scores from the RF model were mostly around 0.25, with the top 1% of variants (highly DelMut) scoring between 0.78 and 0.82 across populations. All populations showed variation in genetic load and in the number of highly DelMut per line (maximum: 13-197). We assessed the impact of using a priori information on predicted deleteriousness to prioritize and weight variants in GP models for yield and flowering time. Stochastic simulations were conducted to evaluate how mating schemes based on variable numbers of DelMut per parent affect genetic gain. Variants with higher predicted scores had significantly different effect distributions than random or lower-scored markers. Yield predictions were 4.47-12.3% more accurate when markers were weighted by effect and deleterious score; no consistent improvement was observed for flowering time. Simulated breeding cycles showed that selecting parents with fewer highly DelMut consistently increased the rate of genetic gain. These results highlight the potential of DelMut information for variant prioritization and the optimization of common bean breeding programs. The approaches we developed can be evaluated in other species to improve the efficacy of crop improvement.

Key messages:
- Predicted deleterious mutations have different distributions of effects depending on population composition.
- Variant prioritization and differential weighting of markers based on effects and deleterious scores can improve the prediction of yield.
- Favoring mating schemes between parents with fewer highly deleterious mutations can increase the rate of genetic gain.
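The parent-selection idea can be sketched as counting, for each line, how many of the top-scoring variants it carries and preferring the lines with the lowest count. This is a simplified illustration, not the authors' simulation: it assumes genotypes are coded as dosages of the putatively deleterious allele and that a line carries a variant when its dosage is above zero. The function names, scores and lines are hypothetical.

```python
def top_fraction_threshold(scores, top_fraction=0.01):
    """Score cutoff for the top `top_fraction` of variants (default: top 1%)."""
    ranked = sorted(scores, reverse=True)
    n_top = max(1, int(len(ranked) * top_fraction))
    return ranked[n_top - 1]


def highly_deleterious_load(genotypes, scores, threshold):
    """Number of high-scoring variants carried by each line.

    genotypes: line name -> dosages (0/1/2) of the putatively deleterious
    allele, in the same variant order as `scores`.
    """
    flagged = [j for j, s in enumerate(scores) if s >= threshold]
    return {line: sum(1 for j in flagged if dos[j] > 0)
            for line, dos in genotypes.items()}


def pick_parents(load, n_parents):
    """Lines with the fewest highly deleterious variants (ties broken by name)."""
    return sorted(load, key=lambda line: (load[line], line))[:n_parents]


# Hypothetical scores for five variants and dosages for three lines.
scores = [0.10, 0.20, 0.80, 0.25, 0.90]
lines = {
    "line_a": [0, 0, 2, 0, 2],
    "line_b": [1, 1, 0, 0, 0],
    "line_c": [0, 0, 1, 2, 0],
}
threshold = top_fraction_threshold(scores, top_fraction=0.4)  # toy: top 40%
load = highly_deleterious_load(lines, scores, threshold)
parents = pick_parents(load, 2)
```

With only five toy variants a 1% cutoff would flag a single site, so the example uses the top 40% instead; the ranking logic is the same at any fraction.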

17

Efficient genomic prediction at reduced training size and moderate marker density in an expanded aus-NAM population of rice

Kitony, J. K.; Reyes, V. P.; Sunohara, H.; Tasaki, M.; Yamasaki, M.; Mori, J.-i.; Shimazu, A.; Nishiuchi, S.; Michael, T. P.; Doi, K.

2026-05-01 plant biology 10.64898/2026.04.28.721500 medRxiv

Top 0.1%

30.2%

Genomic selection (GS) can accelerate genetic gain in crops, but its effectiveness depends on training population design and marker density. Nested association mapping (NAM) populations provide a structured framework that captures broad allelic diversity within a controlled genetic background. Here, we evaluated genomic prediction (GP) and genome-wide association study (GWAS) performance for 11 agronomic traits in an expanded aus-NAM population of rice comprising 1,818 recombinant inbred lines from 14 families, using genotyping-by-sequencing (GBS) markers and projected whole-genome sequence variants. Prediction accuracy plateaued at moderate marker density (~20k SNPs) and with training populations of ~500 lines (~40-60% of the available pool), and trait heritability, rather than model choice or marker density, emerged as the strongest determinant of predictive performance. In contrast, GWAS resolution continued to improve with increasing marker density, enabling detection of additional loci, including a chromosome 12 locus associated with heading date, while consistently recovering well-characterized genes such as EARLY HEADING DATE 1 (Ehd1) and SEMIDWARF 1 (SD1). These contrasting patterns indicate that GP reaches near-optimal performance once genome-wide variation is adequately represented, whereas GWAS benefits from higher marker density through improved locus resolution. The present study establishes a benchmark for implementing GP in breeding programs involving japonica/indica crosses in a single environment.
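One way to see why accuracy levels off as the training set grows, and why heritability matters so much, is the deterministic approximation of Daetwyler et al. (2008), r = sqrt(N h² / (N h² + M_e)), where N is the training size, h² the heritability and M_e the effective number of independent chromosome segments. This is a textbook expectation, not the model fitted in the study, and the M_e value below is an arbitrary assumption.

```python
import math


def expected_accuracy(n_train, h2, m_e):
    """Deterministic expected accuracy of genomic prediction.

    Daetwyler-style approximation r = sqrt(N*h2 / (N*h2 + Me)).
    Me is an assumed input; it is not estimated here.
    """
    x = n_train * h2
    return math.sqrt(x / (x + m_e))


M_E = 500  # assumed value, for illustration only
for h2 in (0.3, 0.8):
    for n in (250, 500, 1000, 1800):
        print(f"h2={h2} N={n}: r={expected_accuracy(n, h2, M_E):.2f}")
```

Heritability enters as a multiplier on N, so a more heritable trait reaches a given accuracy with fewer lines, and once N h² is large relative to M_e each additional block of training lines adds progressively less accuracy.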

18

Machine Learning-GWAS reveals the role of WSD1 gene for cuticular wax ester biosynthesis and key genomic regions controlling early maturity in bread wheat

Tekeu, H.; Jean, M.; Ngonkeu, E. L. M.; Belzile, F.

2023-11-07 genomics 10.1101/2023.11.03.565125 medRxiv

Top 0.1%

30.0%

This study employed machine learning-genome-wide association study (ML-GWAS) to identify genomic regions linked to cuticular wax ester biosynthesis (SW) and early maturity (DM) in wheat. Using a dataset of 170 wheat accessions and 74K SNPs, four GWAS tools (MLM, CMLM, FarmCPU, and BLINK) and five machine learning techniques (RF, ANN, SVR, CNN, and SVM) were applied. A highly significant SW association was found on chromosome 1A, with the peak SNP (chr1A:556842331) explaining 50% of the phenotypic variation. A promising candidate gene, TraesCS1A01G385500, was identified as an ortholog of the Arabidopsis thaliana WSD1 gene, which plays a crucial role in very-long-chain (VLC) wax ester biosynthesis. For DM, four QTLs were detected on chromosomes 4B (two QTLs), 2A, and 5A. Haplotype analysis revealed that TT alleles contribute significantly to cuticular wax ester biosynthesis and early maturity in wheat varieties. The study underscores the superior performance of ML models, especially when combined with advanced multi-locus GWAS models such as BLINK and FarmCPU, which yielded significantly lower p-values for relevant QTLs than traditional methods. ML approaches have the potential to transform the study of complex genetic traits, offering insights to enhance wheat crop resilience and quality. ML-GWAS emerges as a compelling tool for genomics-based breeding, enabling breeders to develop improved wheat varieties with greater precision and efficiency.
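A figure such as "50% of the phenotypic variation" is a proportion of variance explained (PVE). In its simplest form, for one SNP with no covariates or kinship term, it is the R² of regressing the phenotype on allele dosage, as sketched below. The MLM, FarmCPU and BLINK models used in the study account for population structure and relatedness, so their PVE estimates can differ; the data here are hypothetical.

```python
def variance_explained(dosage, phenotype):
    """Proportion of phenotypic variance explained by one SNP.

    R^2 of the simple regression phenotype ~ 1 + dosage, i.e. the squared
    Pearson correlation between allele dosage and phenotype.
    """
    n = len(dosage)
    mx = sum(dosage) / n
    my = sum(phenotype) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(dosage, phenotype))
    sxx = sum((x - mx) ** 2 for x in dosage)
    syy = sum((y - my) ** 2 for y in phenotype)
    return sxy * sxy / (sxx * syy)


# Hypothetical accessions: SNP dosage and a wax-ester score.
snp = [0, 0, 2, 2]
wax = [1.0, 2.0, 3.0, 4.0]
r2 = variance_explained(snp, wax)
```

For these four toy accessions the SNP explains 80% of the variance; the remaining 20% is the spread within each genotype class.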

19

Overcoming barriers to the registration of new varieties

Yang, C. J.; Russell, J.; Ramsay, L.; Thomas, W.; Powell, W.; Mackay, I.

2020-10-09 genetics 10.1101/2020.10.08.331892 medRxiv

Top 0.1%

27.5%

Distinctness, Uniformity and Stability (DUS) is an intellectual property system introduced in 1961 by the International Union for the Protection of New Varieties of Plants (UPOV) to safeguard investment and reward innovation in developing new plant varieties. Despite rapid advances in our understanding of crop biology over the past 60 years, the DUS system has not changed and still depends on a set of morphological traits for testing candidate varieties. As demand for new plant varieties increases, the barriers to their registration become more acute, and the system urgently requires review. To highlight the challenges and possible remedies in the current system, we evaluated a comprehensive panel of 805 UK barley varieties spanning the entire history of DUS testing. Our findings reveal deficiencies in the system and provide evidence for a shift towards a robust, genomics-enabled registration system for new crop varieties.

20

Genetic Gains from Sixty Years of Spring Wheat Breeding in the Northern Plains of the US

Gill, H. S.; Blecha, S.; Brault, C.; Glover, K.; Green, A.; Cook, J.; Lorenz, A.; Read, A. C.; Anderson, J. A.

2025-05-23 plant biology 10.1101/2025.05.21.655386 medRxiv

Top 0.1%

27.2%

Evaluating genetic gains over time is essential for assessing the success of breeding programs and refining strategies for ongoing improvement. Hard red spring (HRS) wheat is an important wheat class in the US and is grown primarily in the Northern Great Plains. Despite a long history of breeding efforts in this region, long-term quantification of genetic gains for key traits has remained limited. This study analyzes over sixty years of data from the USDA-coordinated Hard Red Spring Wheat Uniform Regional Nursery (HRSWURN) to evaluate genetic advancement in agronomic traits across multiple phases. A significant positive genetic gain of 0.61% per annum was observed for grain yield in HRS wheat released in the Northern US region, lower than the gains expected to be needed to meet future wheat demand. The corresponding annual changes were 0.07% for test weight, -0.04% for days to heading, and -0.16% for plant height. Notably, sustained yield improvement has not affected grain protein levels since protein was first measured in 1995, indicating that ongoing selection has effectively balanced grain yield and protein despite their negative correlation (r = -0.31). Assessment of genetic gains over 20-year phases suggested slowing rates of gain for grain yield but did not indicate any plateaus. Realized genetic gains were generally higher for individual breeding programs breeding for their target environments, with the public breeding program in Minnesota achieving gains of approximately 1% per annum. These findings highlight the significant impact of long-term breeding efforts and offer valuable insights for refining future breeding strategies.
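A gain "per annum" is commonly obtained by regressing performance on year of release (or first year in trial) and expressing the slope as a percentage of a baseline. The sketch below assumes the baseline is the fitted value in the earliest year; the HRSWURN analysis may use a different baseline or a mixed model that adjusts for environments and checks, and the yields are hypothetical.

```python
def percent_gain_per_year(years, values):
    """OLS slope of `values` on `years`, as a percentage of the fitted first-year value."""
    n = len(years)
    mx = sum(years) / n
    my = sum(values) / n
    slope = sum((x - mx) * (y - my) for x, y in zip(years, values)) / sum(
        (x - mx) ** 2 for x in years
    )
    # Fitted value at the earliest year serves as the (assumed) baseline.
    baseline = my + slope * (min(years) - mx)
    return 100 * slope / baseline


# Hypothetical cultivar means (arbitrary yield units) by release year.
release_year = [2000, 2010, 2020]
mean_yield = [100.0, 106.0, 112.0]
gain = percent_gain_per_year(release_year, mean_yield)
```

Here the slope is 0.6 units per year on a fitted baseline of 100 units, or 0.6% per annum; choosing a later baseline year, or the overall mean, would give a somewhat smaller percentage for the same slope.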